pip install kaggle
pip install opendatasets
import opendatasets as od
import pandas
od.download(
"https://www.kaggle.com/datasets/arezalo/customer-dataset")
Skipping, found downloaded files in ".\customer-dataset" (use force=True to force download)
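The download step leaves the files in a local `customer-dataset` folder. A minimal sketch for checking that the CSV is where later cells expect it, before trying to read it, is given below. The helper name `resolve_dataset_csv` is hypothetical and not part of `opendatasets`; the folder and file names are taken from this notebook.

```python
from pathlib import Path


def resolve_dataset_csv(folder="customer-dataset", filename="Customer_Data.csv"):
    """Return the CSV path as a Path, or raise FileNotFoundError if it is missing.

    Hypothetical helper: the defaults mirror the folder and file used in this notebook.
    """
    csv_path = Path(folder) / filename  # pathlib picks the right separator per OS
    if not csv_path.is_file():
        raise FileNotFoundError(f"Expected dataset file at {csv_path}")
    return csv_path
```

Building the path with `pathlib` avoids hard-coding a Windows-style backslash, so the same cell should also run on Linux or macOS.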
| Variable Name | Description | Sample Data |
|---|---|---|
| CUST_ID | Credit card holder ID | C10001; C10002; ... |
| BALANCE | Remaining account balance available for purchases | 40.900749; 3202.467416; ... |
| BALANCE_FREQUENCY | Balance update frequency (between 0 and 1; 1 = frequently updated, 0 = not frequently updated) | 0.818182; 0.909091; ... |
| PURCHASES | Account purchases amount | 95.4; 773.17; ... |
| ONEOFF_PURCHASES | Maximum purchase amount in single transaction | 1499; 16; ... |
| INSTALLMENTS_PURCHASES | Amount of purchases made in installments | 95.4; 1333.28; ... |
| CASH_ADVANCE | The user's advance payment in cash | 6442.945483; 205.788017; ... |
| PURCHASES_FREQUENCY | Frequency of purchases made on a regular basis (between 0 and 1; 1 = frequently purchased, 0 = not frequently purchased) | 0.166667; 0.083333; ... |
| ONEOFF_PURCHASES_FREQUENCY | Frequency of purchases made in a single transaction (between 0 and 1; 1 = frequently purchased, 0 = not frequently purchased) | 0.083333; 0.083333; ... |
| PURCHASES_INSTALLMENTS_FREQUENCY | Frequency of purchases made in installments (between 0 and 1; 1 = frequently done, 0 = not frequently done) | 0.083333; 0.583333; ... |
| CASH_ADVANCE_FREQUENCY | Frequency of cash in advance | 0.25; 0.083333; ... |
| CASH_ADVANCE_TRX | "Cash in advance" total transactions | 0; 4; ... |
| PURCHASES_TRX | Purchase total transactions | 2; 12; ... |
| CREDIT_LIMIT | Credit card limit of a user | 1000; 7000; ... |
| PAYMENTS | Total amount paid by the user | 201.802084; 4103.032597; ... |
| MINIMUM_PAYMENTS | Minimum payment amount made by user | 139.509787; 1072.340217; ... |
| PRC_FULL_PAYMENT | Percent of total charge paid by the user | 0; 0.222222; ... |
| TENURE | Credit card tenure of a user | 12; 8; ... |
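The data dictionary states that four of the frequency variables lie between 0 and 1. A rough sketch of how that claim could be checked is shown here, using only the sample values listed in the table. The helper `out_of_range_counts` is a hypothetical name introduced for illustration; on the full dataset it would be called on the loaded DataFrame, with column names matching whatever case the CSV uses.

```python
import pandas as pd

# Sample values copied from the data dictionary above (not the full dataset).
sample = pd.DataFrame({
    "BALANCE_FREQUENCY": [0.818182, 0.909091],
    "PURCHASES_FREQUENCY": [0.166667, 0.083333],
    "ONEOFF_PURCHASES_FREQUENCY": [0.083333, 0.083333],
    "PURCHASES_INSTALLMENTS_FREQUENCY": [0.083333, 0.583333],
})


def out_of_range_counts(frame, columns, low=0.0, high=1.0):
    """Count, per column, the values that fall outside [low, high].

    Missing values compare as False on both sides, so they are not counted.
    """
    subset = frame[columns]
    outside = (subset < low) | (subset > high)
    return outside.sum()


violations = out_of_range_counts(sample, list(sample.columns))
print(violations)
```

A non-zero count for any column would suggest either a data-entry problem or that the documented range does not hold for that variable.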
!pip install yellowbrick
!pip install pywaffle
import pandas as pd
from pandas_profiling import ProfileReport
import matplotlib.pyplot as plt
import numpy as np
import yellowbrick
import seaborn as sns
import warnings
import os
import scipy.cluster.hierarchy as shc
import matplotlib.patches as patches
from matplotlib.patches import Rectangle
from pywaffle import Waffle
from math import isnan
from random import sample
from numpy.random import uniform
from sklearn.neighbors import NearestNeighbors
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.metrics import davies_bouldin_score, silhouette_score, calinski_harabasz_score
from yellowbrick.cluster import KElbowVisualizer, SilhouetteVisualizer
from yellowbrick.style import set_palette
from yellowbrick.contrib.wrapper import wrap
warnings.filterwarnings('ignore')
# --- Importing Dataset ---
df = pd.read_csv('customer-dataset/Customer_Data.csv')  # forward slash works on Windows, Linux, and macOS
# --- Previewing the First Rows ---
df.head()
| | cust_id | balance | balance_frequency | purchases | oneoff_purchases | installments_purchases | cash_advance | purchases_frequency | oneoff_purchases_frequency | purchases_installments_frequency | cash_advance_frequency | cash_advance_trx | purchases_trx | credit_limit | payments | minimum_payments | prc_full_payment | tenure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C10001 | 40.900749 | 0.818182 | 95.40 | 0.00 | 95.4 | 0.000000 | 0.166667 | 0.000000 | 0.083333 | 0.000000 | 0 | 2 | 1000.0 | 201.802084 | 139.509787 | 0.000000 | 12 |
| 1 | C10002 | 3202.467416 | 0.909091 | 0.00 | 0.00 | 0.0 | 6442.945483 | 0.000000 | 0.000000 | 0.000000 | 0.250000 | 4 | 0 | 7000.0 | 4103.032597 | 1072.340217 | 0.222222 | 12 |
| 2 | C10003 | 2495.148862 | 1.000000 | 773.17 | 773.17 | 0.0 | 0.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0 | 12 | 7500.0 | 622.066742 | 627.284787 | 0.000000 | 12 |
| 3 | C10004 | 1666.670542 | 0.636364 | 1499.00 | 1499.00 | 0.0 | 205.788017 | 0.083333 | 0.083333 | 0.000000 | 0.083333 | 1 | 1 | 7500.0 | 0.000000 | NaN | 0.000000 | 12 |
| 4 | C10005 | 817.714335 | 1.000000 | 16.00 | 16.00 | 0.0 | 0.000000 | 0.083333 | 0.083333 | 0.000000 | 0.000000 | 0 | 1 | 1200.0 | 678.334763 | 244.791237 | 0.000000 | 12 |
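The preview shows a `NaN` in `minimum_payments` for customer C10004, so it may be worth counting missing values per column before any modelling. The sketch below rebuilds three columns of the five preview rows by hand, purely for illustration; on the real data the same two calls would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Three columns copied from the five preview rows above.
preview = pd.DataFrame({
    "cust_id": ["C10001", "C10002", "C10003", "C10004", "C10005"],
    "payments": [201.802084, 4103.032597, 622.066742, 0.000000, 678.334763],
    "minimum_payments": [139.509787, 1072.340217, 627.284787, np.nan, 244.791237],
})

# Number of missing entries in each column.
missing_counts = preview.isna().sum()

# Rows that contain at least one missing value.
rows_with_missing = preview[preview.isna().any(axis=1)]

print(missing_counts)
print(rows_with_missing)
```

How the gaps are then handled (dropping rows, imputing, or something else) is a separate decision that depends on how many values turn out to be missing in the full dataset.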
ProfileReport(df, title="Pandas Profiling Report")